arXiv:1506.04779v1 [math.NA] 15 Jun 2015


Orthogonal Matching Pursuit under the Restricted Isometry Property

Albert  Cohen,  Wolfgang  Dahmen,  and  Ronald  DeVore 

June  17,  2015 


Abstract 

This paper is concerned with the performance of Orthogonal Matching Pursuit
(OMP) algorithms applied to a dictionary $\mathcal{D}$ in a Hilbert space $\mathcal{H}$. Given an element
$f \in \mathcal{H}$, OMP generates a sequence of approximations $f_n$, $n = 1, 2, \dots$, each of which
is a linear combination of $n$ dictionary elements chosen by a greedy criterion. The
question studied is whether the approximations $f_n$ are in some sense comparable to the best
$n$-term approximation from the dictionary. One important result related to this question
is a theorem of Zhang [8] in the context of sparse recovery of finite dimensional
signals. This theorem shows that OMP exactly recovers $n$-sparse signals whenever
the dictionary $\mathcal{D}$ satisfies a Restricted Isometry Property (RIP) of order $An$ for some
constant $A$, and that the procedure is also stable in $\ell^2$ under measurement noise.

The main contribution of the present paper is to give a structurally simpler proof of
Zhang's theorem, formulated in the general context of $n$-term approximation from a
dictionary in arbitrary Hilbert spaces $\mathcal{H}$. Namely, it is shown that OMP generates
near-best $n$-term approximations under a similar RIP condition.

AMS  Subject  Classification:  94A12,  94A15,  68P30,  41A46,  15A52 

Key Words: Orthogonal matching pursuit, best $n$-term approximation, instance optimality,
restricted isometry property.

1  Introduction 

Approximation by sparse linear combinations of elements from a fixed redundant family is
a frequently employed technique in signal processing and other application domains. We
consider such problems in a separable Hilbert space $\mathcal{H}$ endowed with a norm $\|\cdot\| := \|\cdot\|_{\mathcal{H}}$
induced by the scalar product $\langle\cdot,\cdot\rangle$ on $\mathcal{H} \times \mathcal{H}$. A countable collection $\mathcal{D} = \{\varphi_\gamma\}_{\gamma\in\Gamma} \subset \mathcal{H}$
is called a dictionary if it is complete, i.e., the set of finite linear combinations of elements
of the dictionary is dense in $\mathcal{H}$. The simplest example of a dictionary is the set of
elements of a fixed basis of $\mathcal{H}$. But our primary interest is in redundant families. In such
a case, there exists a strict subset of $\mathcal{D}$ that is still a dictionary. A primary example of
a redundant dictionary is a frame, e.g., any union of a finite number of bases. Without
loss of generality we shall always assume that the dictionary $\mathcal{D}$ is normalized, i.e.,

$$\|\varphi_\gamma\| = 1, \quad \gamma \in \Gamma.$$

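Given such a dictionary $\mathcal{D}$, we consider the class

$$\Sigma_n = \Sigma_n(\mathcal{D}) := \Big\{ \sum_{\gamma\in S} c_\gamma \varphi_\gamma \;:\; \#(S) \le n \Big\} \subset \mathcal{H}, \quad n \ge 1. \tag{1.1}$$

The elements in $\Sigma_n$ are said to be sparse with sparsity $n$. We define

$$\sigma_n(f)_{\mathcal{H}} := \inf_{g\in\Sigma_n} \|f - g\|,$$

which is called the error of best $n$-term approximation to $f$ from the dictionary $\mathcal{D}$.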

An important distinction between $n$-term dictionary approximation and other forms
of approximation, such as approximation from an $n$-dimensional space, is that the set
$\Sigma_n$ is not a linear space, since the sum of two elements in $\Sigma_n$ is generally not in $\Sigma_n$,
although it is in $\Sigma_{2n}$. Thus $n$-term approximation from a dictionary is an important
example of nonlinear approximation [3] that reaches into numerous application areas
such as adaptive PDE solvers, image encoding, or statistical learning. It also serves as
a performance benchmark in compressed sensing that better captures the robustness of
compressed sensing than results on exact sparsity recovery [2].

While there are many themes in $n$-term dictionary approximation, our interest here is
in analyzing the performance of greedy algorithms for generating $n$-term approximations
to a given target element $f \in \mathcal{H}$. There are numerous papers on this subject. We
refer the reader to the survey article [6] as a general reference. Our particular interest
is in understanding what properties of the dictionary $\mathcal{D}$ guarantee that these algorithms
perform similarly to best $n$-term approximation.

These algorithms and best $n$-term approximation have a simple description when the
dictionary $\mathcal{D}$ is an orthonormal or, more generally, a Riesz basis of $\mathcal{H}$. In this case, the
best $n$-term approximations to a given $f \in \mathcal{H}$ are realized by expanding $f$ in terms of the
basis

$$f = \sum_{\gamma\in\Gamma} c_\gamma \varphi_\gamma \tag{1.2}$$

and retaining $n$ terms from this expansion which correspond to the largest (in absolute
value) expansion coefficients. The typical greedy algorithm will construct the same
approximations. The situation is much less clear when dealing with more general
dictionaries.

In the case of general dictionaries, algorithms for generating $n$-term approximations
are typically built on some form of greedy selection

$$\varphi_k := \varphi_{\gamma_k}, \quad k = 1, 2, \dots, \tag{1.3}$$

of elements from $\mathcal{D}$ and then using a linear combination of $\varphi_1, \dots, \varphi_n$ as the $n$-term
approximation. The standard greedy algorithm (called the Pure Greedy Algorithm) makes
the initial selection $\varphi_1$ as any element such that

$$\varphi_1 := \operatorname*{Argmax}_{\varphi\in\mathcal{D}} |\langle f, \varphi\rangle|. \tag{1.4}$$

This gives the approximation $f_1 := \langle f, \varphi_1\rangle \varphi_1$ to $f$ and the residual $r_1 := f - f_1$.

Given that $\varphi_1, \dots, \varphi_{k-1}$ have been selected, and an approximation $f_{k-1}$ from
$F_{k-1} := \operatorname{span}\{\varphi_1, \dots, \varphi_{k-1}\}$ has been constructed, the next dictionary element $\varphi_k$ is chosen as the
best match of the residual

$$r_{k-1} := f - f_{k-1}, \tag{1.5}$$

in the sense that

$$\varphi_k := \operatorname*{Argmax}_{\gamma\in\Gamma} |\langle r_{k-1}, \varphi_\gamma\rangle|. \tag{1.6}$$

There exist different ways of forming the next approximation $f_k$, resulting in different
greedy algorithms. We focus our attention on Orthogonal Matching Pursuit (OMP), which
forms the new approximation as

$$f_k := P_k f, \tag{1.7}$$

where $P_k$ is the orthogonal projector onto $F_k$. OMP is also called the Orthogonal Greedy
Algorithm. More generally, we analyze the Weak Orthogonal Matching Pursuit (WOMP),
where the choice of $\varphi_k$ is only required to satisfy

$$|\langle r_{k-1}, \varphi_k\rangle| \ge \kappa \max_{\gamma\in\Gamma} |\langle r_{k-1}, \varphi_\gamma\rangle|, \tag{1.8}$$

where $\kappa \in\, ]0, 1]$ is a fixed parameter; this selection rule is more easily implemented in
practical applications. Once this choice of $\varphi_1, \dots, \varphi_k$ is made, $f_k$ is again defined as
the orthogonal projection of $f$ onto $F_k$.
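
For a finite-dimensional space $\mathcal{H} = \mathbb{R}^d$, the selection rule (1.8) and the projection step (1.7) can be sketched in a few lines. The following is only an illustration under stated assumptions, not the authors' implementation: the function name `womp`, the list-of-lists representation of the dictionary, the tolerance `tol`, and the tie-breaking rule (first admissible index) are all choices made for this example. Instead of solving a least-squares problem at each step, the sketch maintains an orthonormal basis of $F_k$ by Gram-Schmidt, so that $r_k = r_{k-1} - \langle r_{k-1}, q_k\rangle q_k$ with $q_k$ the new basis vector.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def womp(f, dictionary, steps, kappa=1.0, tol=1e-12):
    """Weak Orthogonal Matching Pursuit in R^d (illustrative sketch).

    f          : target vector (list of floats)
    dictionary : list of unit-norm atoms (lists of floats)
    kappa      : weakness parameter in ]0, 1]; kappa = 1 gives plain OMP
    Returns the selected indices and the residual norms ||r_0||, ||r_1||, ...
    """
    residual = list(f)
    selected, basis = [], []
    norms = [math.sqrt(dot(residual, residual))]
    for _ in range(steps):
        # Correlations |<r_{k-1}, phi_gamma>|; atoms already chosen are skipped.
        scores = [0.0 if i in selected else abs(dot(residual, phi))
                  for i, phi in enumerate(dictionary)]
        best = max(scores)
        if best <= tol:  # residual is numerically orthogonal to every atom
            break
        # Weak selection rule (1.8): take the first atom whose correlation
        # is within a factor kappa of the best one.
        k = next(i for i, s in enumerate(scores) if s >= kappa * best)
        # Orthonormalize the new atom against the current basis of F_{k-1}.
        q = list(dictionary[k])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        nq = math.sqrt(dot(q, q))
        q = [qi / nq for qi in q]
        basis.append(q)
        selected.append(k)
        # Projection step (1.7), written as an update of the residual.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
        norms.append(math.sqrt(dot(residual, residual)))
    return selected, norms

# Example: a redundant dictionary in R^3 (canonical basis plus one extra atom).
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
selected, norms = womp([3.0, -2.0, 0.5], atoms, steps=3)
```

In this example the target is a combination of the three basis atoms, so three steps with $\kappa = 1$ drive the residual to zero; the residual norms are non-increasing by construction, since each $F_k$ contains $F_{k-1}$.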

The main interest of the present paper is to understand what properties of a dictionary
$\mathcal{D}$ guarantee that the approximation rate of WOMP after $O(n)$ steps is comparable to
the best $n$-term approximation error $\sigma_n(f)$, at least for a certain range $n \le N$. A related
but less demanding question is to understand when WOMP is guaranteed to exactly
recover $f$ whenever $f \in \Sigma_n$ in $O(n)$ steps for a suitable range of $n$. This is sometimes
referred to as sparse recovery. Of course, as already mentioned, we know that both of these
questions have a positive answer for the entire range of $n$ whenever $\mathcal{D}$ is a Riesz basis for
$\mathcal{H}$.

To give a precise formulation of the type of performance we seek, we define the concept
of instance optimality.

Instance Optimality: We say that the WOMP algorithm satisfies instance optimality
for $n \le N$ if there are constants $A, C > 0$, with $A$ an integer, such that the outputs $f_{An}$
of WOMP satisfy

$$\|f - f_{An}\| \le C \sigma_n(f)_{\mathcal{H}}, \tag{1.9}$$

for $n \le N$.

Notice that if (1.9) is satisfied, then it implies a positive solution to the sparse recovery
problem for the same range of $n$, since $\sigma_n(f) = 0$ when $f$ is in $\Sigma_n$. Obtaining results on
sparse recovery or instance optimality requires structure on the dictionary $\mathcal{D}$. The first
results of this type were obtained under assumptions on the coherence of a dictionary
$\mathcal{D} \subset \mathcal{H}$ defined by

$$\mu = \mu(\mathcal{D}) := \sup\{ |\langle \varphi, \psi\rangle| \;:\; \varphi, \psi \in \mathcal{D},\ \varphi \ne \psi \}.$$

The first results on this general circle of problems centered on sparse recovery. Tropp
[7] proved that whenever the dictionary has coherence $\mu < \frac{1}{2n-1}$, then $n$ steps of OMP
recover any $f \in \Sigma_n$ exactly.

Concerning instance optimality, we mention that Livschitz [5] proved that whenever
$\mu \le \frac{1}{20n}$, then after $2n$ steps, the OMP algorithm returns $f_{2n} \in \Sigma_{2n}$ such that

$$\|f - f_{2n}\| \le 3 \sigma_n(f)_{\mathcal{H}}. \tag{1.10}$$

A weaker assumption on a dictionary, known as the Restricted Isometry Property
(RIP), was introduced in the context of compressed sensing [1]. To formulate this property,
we introduce the notation

$$\Phi c = \sum_{\gamma\in\Gamma} c_\gamma \varphi_\gamma, \tag{1.11}$$

whenever $c = (c_\gamma)_{\gamma\in\Gamma}$ is a finitely supported sequence. The dictionary $\mathcal{D}$ is said to satisfy
the RIP of order $n \in \mathbb{N}$ with constant $0 < \delta < 1$ provided

$$(1 - \delta)\|c\|_{\ell^2}^2 \le \|\Phi c\|^2 \le (1 + \delta)\|c\|_{\ell^2}^2, \quad \|c\|_{\ell^0} := \#(\operatorname{supp} c) \le n. \tag{1.12}$$

Hence this property quantifies the deviation of any subset of cardinality at most $n$ from
an orthonormal set. We denote by $\delta_n$ the minimal value of $\delta$ for which this property holds
and remark that trivially $\delta_n \le \delta_{n+1}$.

It is well known that a coherence bound

$$\mu(\mathcal{D}) < (n - 1)^{-1} \tag{1.13}$$

implies the validity of RIP($n$) with $\delta_n \le (n - 1)\mu$, but not vice versa [7].
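
The relation between coherence and the RIP inequality (1.12) can be probed numerically on a small example. The sketch below is an illustration only: the helper names `coherence` and `sampled_rip_ratios`, the four-atom dictionary in $\mathbb{R}^3$, and the random sampling scheme are assumptions made here. Sampling random sparse vectors does not compute $\delta_n$ exactly; it only checks that the observed ratios $\|\Phi c\|^2 / \|c\|_{\ell^2}^2$ stay within $1 \pm (n-1)\mu$, as the bound above predicts.

```python
import math
import random

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def coherence(dictionary):
    """mu(D): largest |<phi, psi>| over pairs of distinct atoms."""
    return max(abs(dot(dictionary[i], dictionary[j]))
               for i in range(len(dictionary))
               for j in range(i + 1, len(dictionary)))

def sampled_rip_ratios(dictionary, n, trials=2000, seed=0):
    """Min and max of ||Phi c||^2 / ||c||^2 over random c with #supp(c) <= n.

    This only samples the RIP inequality (1.12); it is not an exact
    computation of the constant delta_n.
    """
    rng = random.Random(seed)
    dim = len(dictionary[0])
    lo, hi = float("inf"), 0.0
    for _ in range(trials):
        support = rng.sample(range(len(dictionary)), n)
        coeffs = [rng.uniform(-1.0, 1.0) for _ in support]
        energy = dot(coeffs, coeffs)
        if energy == 0.0:
            continue
        # Phi c = sum over the support of c_gamma * phi_gamma
        phi_c = [sum(c * dictionary[g][i] for c, g in zip(coeffs, support))
                 for i in range(dim)]
        ratio = dot(phi_c, phi_c) / energy
        lo, hi = min(lo, ratio), max(hi, ratio)
    return lo, hi

# Canonical basis of R^3 plus the normalized diagonal vector.
t = 1.0 / math.sqrt(3.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [t, t, t]]
n = 2
mu = coherence(atoms)            # equals 1/sqrt(3) for this dictionary
lo, hi = sampled_rip_ratios(atoms, n)
bound = (n - 1) * mu             # predicted upper bound for delta_n
```

For $n = 2$ the Gram matrix of any two unit atoms has eigenvalues $1 \pm |\langle\varphi,\psi\rangle|$, so the sampled ratios necessarily lie in $[1 - \mu, 1 + \mu]$.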

In [8], Tong Zhang proved that OMP exactly recovers finite dimensional $n$-sparse
signals whenever the dictionary $\mathcal{D}$ satisfies a Restricted Isometry Property (RIP) of order
$An$ for some constant $A$, and that the procedure is also stable in $\ell^2$ under measurement
noise. The main result of the present paper is the following related theorem on instance
optimality for WOMP.

Theorem 1.1 Given the weakness parameter $\kappa \le 1$, there exist fixed constants $A, C, \delta^*$
such that the following holds for all $n \ge 0$: if $\mathcal{D}$ is a dictionary in a Hilbert space $\mathcal{H}$ for
which RIP($(A + 1)n$) holds with $\delta_{(A+1)n} \le \delta^*$, then, for any target function $f \in \mathcal{H}$, the
WOMP algorithm returns after $An$ steps an approximation $f_{An}$ to $f$ that satisfies

$$\|f - f_{An}\| \le C \sigma_n(f)_{\mathcal{H}}. \tag{1.14}$$

The values of $A$, $C$, $\kappa$, and $\delta^*$ for which the above result holds are coupled. For
example, it is possible to have a smaller value of $A$ at the price of a larger value of $C$ or
of a smaller value of $\delta^*$. Similarly, a smaller weakness parameter $\kappa$ can be compensated
by increasing $A$.

While the theorem of [8] is not stated in the above form, it can be used to derive
Theorem 1.1 by interpreting the error of best $n$-term approximation as a measurement
noise. In this way, one version of the above result can be derived from [8] for OMP ($\kappa = 1$)
with $\delta^* = \frac{1}{3}$ and $A = 30$. Let us mention that Zhang's theorem is also established in [4],
with the same proof, but with different constants $\delta^* = \frac{1}{6}$ and $A = 12$.

In what follows, we do not focus on improving the constants; rather, our interest
is to provide a conceptually more elementary proof of Theorem 1.1. Namely, the proof
in [8] and [4] is based on an induction argument which involves an auxiliary greedy
algorithm (initialized from a nontrivial sparse approximation) in an inner loop. Our
proof avoids using this auxiliary step. It is also presented in the framework of a possibly
infinite dimensional Hilbert space $\mathcal{H}$. We give the new proof in the following section. We
then give some observations that can be derived from Theorem 1.1.

In this paper, we shall sometimes use the notation $\Phi^* v = (\langle v, \varphi_\gamma\rangle)_{\gamma\in\Gamma}$ for any $v \in \mathcal{H}$,
and $c_T$ to denote, for any $c = (c_\gamma)_{\gamma\in\Gamma}$ and $T \subset \Gamma$, the sequence whose entries coincide
with those of $c$ on $T$ and are 0 otherwise.


2  Proof of Theorem 1.1

In this section, we give a proof of Theorem 1.1. We begin with the following elementary
lemma, which guarantees the existence of near-best $n$-term approximations from a
dictionary.

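Lemma 2.1 Let $\mathcal{D}$ be a dictionary in a Hilbert space $\mathcal{H}$ that satisfies RIP($2n$). Then,

(i) the set $\Sigma_n$ of all $n$-term linear combinations from $\mathcal{D}$ is closed in $\mathcal{H}$.

(ii) For each $f \in \mathcal{H}$, $\varepsilon > 0$, and $n \ge 1$, there exists a $g \in \Sigma_n$ such that

$$\|f - g\| \le (1 + \varepsilon)\sigma_n(f)_{\mathcal{H}}. \tag{2.1}$$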

Proof: To prove (i), we let $(g_k)_{k\ge 0}$ be a sequence of elements from $\Sigma_n$ that converges in
$\mathcal{H}$ towards some $g \in \mathcal{H}$. We may write

$$g_k = \Phi c^k = \sum_{\gamma\in\Gamma} c^k_\gamma \varphi_\gamma, \tag{2.2}$$

with $\|c^k\|_{\ell^0} \le n$. For any $\varepsilon > 0$, there exists $K$ such that

$$\|g_k - g_l\| \le \varepsilon, \quad k, l \ge K. \tag{2.3}$$

From RIP($2n$), it follows that

$$\|c^k - c^l\|_{\ell^2} \le \frac{\varepsilon}{\sqrt{1 - \delta_{2n}}}, \tag{2.4}$$

which shows that the sequence $(c^k)_{k\ge 0}$ converges in $\ell^2$ to some $c \in \ell^2$. In particular, we
find that

$$\lim_{k\to+\infty} c^k_\gamma = c_\gamma, \quad \gamma \in \Gamma. \tag{2.5}$$

If $c_\gamma \ne 0$ for more than $n$ values of $\gamma$, we find that $\|c^k\|_{\ell^0} > n$ for $k$ sufficiently large, which
is a contradiction. It follows that $g = \sum_{\gamma\in\Gamma} c_\gamma \varphi_\gamma \in \Sigma_n$.

To prove (ii), let $g_k \in \Sigma_n$ be such that $\|g_k - f\| \to \sigma_n(f)_{\mathcal{H}}$. If $\sigma_n(f)_{\mathcal{H}} > 0$, then $g = g_k$
will satisfy (ii) if $k$ is sufficiently large. On the other hand, if $\sigma_n(f)_{\mathcal{H}} = 0$, then $g_k \to f$ as
$k \to \infty$. By (i), $f \in \Sigma_n$ and so we can take $g = f$. □

2.1  Reduction  of  the  residual 

Our starting point in proving Theorem 1.1 is the following lemma from [8], which quantifies
the reduction of the residuals generated by the WOMP algorithm under the RIP condition.
In what follows, we denote by

$$S_k := \{\gamma_1, \dots, \gamma_k\}, \tag{2.6}$$

the set of indices selected after $k$ steps of WOMP applied to the given target element
$f \in \mathcal{H}$, and denote as before the residual by $r_k = f - f_k$.

Lemma 2.2 Let $(f_k)_{k\ge 0}$ be the sequence of approximations generated by the WOMP
algorithm applied to $f$, and let $g = \Phi z$ with $z$ supported on a finite set $T$. Then, if $T$ is not
contained in $S_k$, one has

$$\|r_{k+1}\|^2 \le \|r_k\|^2 - \frac{\kappa^2(1 - \delta)}{\#(T \setminus S_k)} \max\{0, \|r_k\|^2 - \|f - g\|^2\}, \tag{2.7}$$

where $\delta := \delta_{\#(T \cup S_k)}$ is the corresponding RIP-constant and $\kappa \in\, ]0, 1]$ is the weakness
parameter in the WOMP algorithm.

For completeness, we recall the proof at the end of this section. It is at this point that we
depart from the arguments in [8], with the goal of providing a simpler, more transparent
argument. An immediate consequence of Lemma 2.2 is the following.

Proposition 2.3 Assume that for a given $A > 0$ and $\delta^* < 1$, RIP($(A + 1)n$) holds with
$\delta_{(A+1)n} \le \delta^*$. If $g = \Phi z$, where $z$ is supported on a set $T$ such that $\#(T) \le n$, then for
any non-negative integers $(j, m, L)$ such that $\#(T \setminus S_j) \le m$ and $j + mL \le An$, one has

$$\|r_{j+mL}\|^2 \le e^{-\kappa^2(1-\delta^*)L}\|r_j\|^2 + \|f - g\|^2. \tag{2.8}$$

Proof: By Lemma 2.2, applied at each of the steps $l = j, \dots, j + mL - 1$, if $g = \Phi z$ where
$z$ is supported on a set $T$ such that $\#(T) \le n$, then for any non-negative integers $(j, m, L)$
such that $\#(T \setminus S_j) \le m$ and $j + mL \le An$, one has

$$\max\{0, \|r_{j+mL}\|^2 - \|f - g\|^2\} \le \Big(1 - \frac{\kappa^2(1 - \delta^*)}{m}\Big)^{mL} \max\{0, \|r_j\|^2 - \|f - g\|^2\}$$
$$\le e^{-\kappa^2(1-\delta^*)L} \max\{0, \|r_j\|^2 - \|f - g\|^2\},$$

where we have used the fact that $\#(T \setminus S_l) \le m$ for all $l \ge j$. This gives (2.8) and
completes the proof of Proposition 2.3. □

Proof of Theorem 1.1: We fix $f$ and use the abbreviated notation

$$\sigma_n := \sigma_n(f)_{\mathcal{H}}, \quad n \ge 0. \tag{2.9}$$

We first observe that the assertion of the theorem follows from the following.

Claim: If $0 \le k < n$ satisfies

$$\|r_{Ak}\| \le 2\sigma_k, \tag{2.10}$$

and is such that $\sigma_n < \frac{\sigma_k}{4}$, then there exists $k < k' \le n$ such that

$$\|r_{Ak'}\| \le 2\sigma_{k'}. \tag{2.11}$$

Indeed, assuming that this claim holds, we complete the proof of the theorem as
follows. We let $k$ be the largest integer in $\{0, \dots, n\}$ for which $\|r_{Ak}\| \le 2\sigma_k$. Since
$\|r_0\| = \sigma_0 = \|f\|$, such a $k$ exists. If $k < n$, then by the claim we must have $\sigma_k \le 4\sigma_n$ and therefore

$$\|r_{An}\| \le \|r_{Ak}\| \le 2\sigma_k \le 8\sigma_n, \tag{2.12}$$

so that (1.14) holds with $C = 8$.

We are therefore left with proving the claim. For this, we fix

$$\delta^* = \frac{1}{6} \tag{2.13}$$

and $0 \le k < n$ such that (2.10) holds and such that $\sigma_n < \frac{\sigma_k}{4}$. Let $k < K \le n$ be the first
integer such that $\sigma_K < \frac{\sigma_k}{4}$. By (ii) of Lemma 2.1, we know that for any $B > 1$ there is a
$g \in \Sigma_K$ with $\|f - g\| \le B\sigma_K(f)$. Therefore, $g$ has the form

$$g = \Phi z = \sum_{\gamma\in T} z_\gamma \varphi_\gamma, \quad \#(T) = K. \tag{2.14}$$

The significance of $K$ is that on the one hand

$$\|f - g\| \le B\sigma_K < \frac{B\sigma_k}{4}, \tag{2.15}$$

while on the other hand

$$\sigma_k \le 4\sigma_{K-1}. \tag{2.16}$$

To eventually apply Proposition 2.3 for the above $g$ and $j = Ak$, we need to bound
$\#(T \setminus S_{Ak})$ with $A$ yet to be specified. To this end, we write $K = k + M$, with $M > 0$,
and observe that if $S \subset T$ is any set with $\#(S) = M$ and $g_S := \Phi z_S$, then

$$\|g_S\| \ge \|f - (g - g_S)\| - \|f - g\| \ge \sigma_k - B\sigma_K \ge \Big(1 - \frac{B}{4}\Big)\sigma_k, \tag{2.17}$$

where we have used the fact that $g - g_S \in \Sigma_k$. Using RIP, we obtain the following lower
bound for the coefficients of $g$: for any set $S \subset T$ of cardinality $M$,

$$\Big(1 - \frac{B}{4}\Big)^2 \sigma_k^2 \le \|g_S\|^2 \le (1 + \delta^*) \sum_{\gamma\in S} |z_\gamma|^2. \tag{2.18}$$
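
Taking for $S$ the set $S_g$ of the $M$ smallest coefficients of $g$ and noting that then for any
more general $S \subset T$ with $\#(S) \ge M$, one has $\big(\sum_{\gamma\in S} |z_\gamma|^2\big) / \big(\sum_{\gamma\in S_g} |z_\gamma|^2\big) \ge \#(S)/M$,
and hence, since $1 + \delta^* = \frac{7}{6}$,

$$\frac{6}{7}\Big(1 - \frac{B}{4}\Big)^2 \frac{\#(S)}{M}\, \sigma_k^2 \le \sum_{\gamma\in S} |z_\gamma|^2. \tag{2.19}$$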

For the particular set $S := T \setminus S_{Ak}$, if $\#(S) \ge M$, the above bound combined with the
RIP implies

$$(1 - \delta^*)\|z_S\|_{\ell^2}^2 \le \|g - f_{Ak}\|^2 \le (\|g - f\| + \|r_{Ak}\|)^2 \le (B\sigma_K + 2\sigma_k)^2 \le \Big(\frac{B}{4} + 2\Big)^2 \sigma_k^2.$$

Since $\delta^* = 1/6$ this gives the bound

$$\#(T \setminus S_{Ak}) \le \frac{7}{5}\left(\frac{\frac{B}{4} + 2}{1 - \frac{B}{4}}\right)^2 M \le 13M, \tag{2.20}$$

where the second inequality is obtained by taking $B$ sufficiently close to 1.

We now proceed to verify the claim with $k' = K - 1$ when $K - 1 > k$ and with
$k' = k + 1$ otherwise. In the first case we can use the reduction estimate provided by
Proposition 2.3 with $j = Ak$ in combination with (2.16) to deal with the term $\|r_{Ak}\|$ in
(2.8). When $K = k + 1$, however, we cannot bound $\|r_{Ak}\|$ directly in terms of $\sigma_l$ for
some $l > k$. Accordingly, we use Proposition 2.3 in different ways for the two cases.

In the case where $M \ge 2$, i.e., $K - 1 > k$, we apply (2.8) with $j = Ak$, $m = 13M$
and $L = \lceil 4\kappa^{-2} \rceil$. Indeed, $Ak + Lm = Ak + 13M\lceil 4\kappa^{-2} \rceil \le An$ holds for $k + M \le n$ whenever
$A \ge 13\lceil 4\kappa^{-2} \rceil$. Moreover, since $M - 1 \ge M/2$, notice that

$$A(K - 1) = Ak + A(M - 1) \ge Ak + \frac{AM}{2} \ge Ak + 13M\lceil 4\kappa^{-2} \rceil = Ak + Lm, \tag{2.21}$$

whenever

$$A \ge 26\lceil 4\kappa^{-2} \rceil. \tag{2.22}$$

This gives

$$\|r_{A(K-1)}\|^2 \le \|r_{Ak+Lm}\|^2$$
$$\le e^{-10/3}\|r_{Ak}\|^2 + \|f - g\|^2$$
$$\le 4e^{-10/3}\sigma_k^2 + B^2\sigma_K^2$$
$$\le 64e^{-10/3}\sigma_{K-1}^2 + B^2\sigma_{K-1}^2$$
$$\le 4\sigma_{K-1}^2,$$

where we have used (2.16) in the fourth inequality, and the last inequality follows by
taking $B$ sufficiently close to 1. We thus obtain (2.11) for the value $k' = K - 1 > k$.

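In the case $M = 1$, i.e., $K = k + 1$, we apply (2.8) with $j = Ak$, $m = 13$ and

$$L = \lceil 6\kappa^{-2} \rceil.$$

In fact, from (2.20) we know that $\#(T \setminus S_{Ak}) \le 13$ and $An \ge A(k + 1) \ge Ak + mL$ for $A$
satisfying (2.22). This yields

$$\|r_{A(k+1)}\|^2 \le \|r_{Ak+mL}\|^2$$
$$\le e^{-5}\|r_{Ak}\|^2 + \|f - g\|^2$$
$$\le 4e^{-5}\sigma_k^2 + B^2\sigma_{k+1}^2.$$

This implies that $S_{A(k+1)}$ contains $T$. Indeed, if it missed one of the indices $\gamma \in T$, then
we infer from the RIP,

$$(1 - \delta^*)|z_\gamma|^2 \le \|g - f_{A(k+1)}\|^2$$
$$\le (\|f - g\| + \|r_{A(k+1)}\|)^2$$
$$\le \Big(B\sigma_K + \sqrt{4e^{-5}\sigma_k^2 + B^2\sigma_K^2}\Big)^2$$
$$\le \Big(\frac{B}{4} + \sqrt{4e^{-5} + \frac{B^2}{16}}\Big)^2 \sigma_k^2.$$

On the other hand, we know from (2.19) that

$$\frac{6}{7}\Big(1 - \frac{B}{4}\Big)^2 \sigma_k^2 \le |z_\gamma|^2, \tag{2.23}$$

which for $B$ sufficiently close to 1 is a contradiction since $\frac{5}{7}\big(1 - \frac{1}{4}\big)^2 > \big(\frac{1}{4} + \sqrt{4e^{-5} + \frac{1}{16}}\big)^2$.

Since $g$ then belongs to the space onto which $f$ is projected, this implies that
$\|r_{A(k+1)}\| \le \|f - g\| \le B\sigma_{k+1} \le 2\sigma_{k+1}$, and therefore (2.11) holds for the value $k' = k + 1$.

This verifies the claim and hence completes the proof of Theorem 1.1. □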
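
The numerical constants used in the two cases of the proof can be checked with a few lines of arithmetic. The sketch below is an illustration added for the reader, with variable names chosen here; it evaluates the expressions at the limiting value $B = 1$, whereas the proof takes $B > 1$ arbitrarily close to 1, so the strict inequalities carry over by continuity.

```python
import math

delta_star = 1.0 / 6.0
B = 1.0  # limiting value; the proof uses B > 1 arbitrarily close to 1

# (2.20): #(T \ S_{Ak}) <= (7/5) * ((B/4 + 2) / (1 - B/4))^2 * M <= 13 M
card_factor = (7.0 / 5.0) * ((B / 4 + 2) / (1 - B / 4)) ** 2

# Case M >= 2: with L >= 4 kappa^-2 the factor in (2.8) is at most exp(-10/3),
# and 4 e^{-10/3} sigma_k^2 + B^2 sigma_K^2 <= (64 e^{-10/3} + B^2) sigma_{K-1}^2.
decay_m2 = math.exp(-(1 - delta_star) * 4)
const_m2 = 64 * decay_m2 + B ** 2

# Case M = 1: with L >= 6 kappa^-2 the factor in (2.8) is at most exp(-5);
# the contradiction argument needs lhs > rhs.
decay_m1 = math.exp(-(1 - delta_star) * 6)
lhs = (1 - delta_star) * (6.0 / 7.0) * (1 - B / 4) ** 2
rhs = (B / 4 + math.sqrt(4 * decay_m1 + B ** 2 / 16)) ** 2

print(card_factor, const_m2, lhs, rhs)
```

The first quantity stays below 13, the second below 4, and `lhs` exceeds `rhs`, which is what the bounds (2.20), the chain ending in $4\sigma_{K-1}^2$, and the contradiction with (2.23) require.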

Let us observe that Theorem 1.1 does not give that $f_n$ is a near-best $n$-term
approximation in the form

$$\|f - f_n\| \le C_0 \sigma_n(f)_{\mathcal{H}}. \tag{2.24}$$

However, a simple postprocessing of $f_{An}$ by retaining its $n$ largest components does satisfy
(2.24).

Corollary 2.4 Under the assumptions of Theorem 1.1, let $f_{An} = \Phi c^{An}$ be the output of
WOMP after $An$ steps. Let $T \subset \Gamma$, $\#(T) = n$, be a set of indices corresponding to $n$
largest entries of $c^{An}$. Define $f_n^* \in \Sigma_n$ to be the element obtained by retaining from $f_{An}$
only the $n$ terms corresponding to the indices in $T$. Then,

$$\|f - f_n^*\| \le C^* \sigma_n(f)_{\mathcal{H}}, \tag{2.25}$$

where the constant $C^*$ depends on the constant $C$ in Theorem 1.1 and on the RIP-constant
$\delta_{(A+1)n}$.
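
The postprocessing step in the corollary is a plain hard-thresholding of the coefficient sequence. A minimal sketch follows; the function name `keep_n_largest` and the dictionary-of-coefficients representation are assumptions made for this illustration, and ties between equal magnitudes are broken arbitrarily, which is consistent with $T$ being "a set of indices corresponding to $n$ largest entries".

```python
def keep_n_largest(coeffs, n):
    """Retain the n entries of largest absolute value.

    coeffs : dict mapping dictionary indices gamma to coefficients
             (a stand-in for the sequence c^{An})
    Returns a new dict supported on a set T with #(T) <= n.
    """
    ranked = sorted(coeffs, key=lambda g: abs(coeffs[g]), reverse=True)
    return {g: coeffs[g] for g in ranked[:n]}

# Hypothetical coefficient sequence with four nonzero entries.
c_An = {"a": 0.1, "b": -3.0, "c": 2.0, "d": 0.5}
c_T = keep_n_largest(c_An, 2)
```

Here `c_T` keeps the entries indexed by `"b"` and `"c"`; the element $f_n^*$ of the corollary is then $\Phi c_T^{An}$.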

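Proof: By Lemma 2.1, there exists a $c$ with $\|c\|_{\ell^0} \le n$ such that

$$\|f - \Phi c\| \le 2\sigma_n(f)_{\mathcal{H}}. \tag{2.26}$$

It follows that

$$\|c - c^{An}\|_{\ell^2} \le \frac{1}{\sqrt{1 - \delta_{(A+1)n}}}\|\Phi c - \Phi c^{An}\| \le \frac{C + 2}{\sqrt{1 - \delta_{(A+1)n}}}\, \sigma_n(f)_{\mathcal{H}}. \tag{2.27}$$

If $S = \operatorname{supp}(c)$, we obtain

$$\|c - c_T^{An}\|_{\ell^2} \le \|c_T - c_T^{An}\|_{\ell^2} + \|c_{T^c} - c_{T^c}^{An}\|_{\ell^2} + \|c_{T^c}^{An}\|_{\ell^2}$$
$$\le 2\|c - c^{An}\|_{\ell^2} + \|c_{S^c}^{An}\|_{\ell^2}$$
$$\le 3\|c - c^{An}\|_{\ell^2}, \tag{2.28}$$

which, by (2.27), provides

$$\|c - c_T^{An}\|_{\ell^2} \le 3\|c - c^{An}\|_{\ell^2} \le \frac{3(C + 2)}{\sqrt{1 - \delta_{(A+1)n}}}\, \sigma_n(f)_{\mathcal{H}}. \tag{2.29}$$

The approximation $\Phi c_T^{An}$ is in $\Sigma_n$ and satisfies

$$\|f - \Phi c_T^{An}\| \le 2\sigma_n(f)_{\mathcal{H}} + \|\Phi(c_T^{An} - c)\| \le \left(2 + \frac{3\sqrt{1 + \delta_{(A+1)n}}\,(C + 2)}{\sqrt{1 - \delta_{(A+1)n}}}\right)\sigma_n(f)_{\mathcal{H}}, \tag{2.30}$$

which proves (2.25).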


Proof of Lemma 2.2: We may assume that $\|r_k\| \ge \|f - g\|$; otherwise there is nothing
to prove. First observe that

$$\|r_{k+1}\|^2 = \|f - P_{k+1}f\|^2$$
$$= \|f - P_k f\|^2 - \|(P_k - P_{k+1})f\|^2$$
$$\le \|r_k\|^2 - |\langle r_k, \varphi_{\gamma_{k+1}}\rangle|^2.$$

Therefore, it suffices to prove that $\|r_k\|^2 - |\langle r_k, \varphi_{\gamma_{k+1}}\rangle|^2$ is bounded by the right hand side
of (2.7), which amounts to showing that

$$(1 - \delta)(\|r_k\|^2 - \|f - g\|^2) \le \kappa^{-2}\,\#(T \setminus S_k)\,|\langle r_k, \varphi_{\gamma_{k+1}}\rangle|^2. \tag{2.31}$$

To prove this, we first note that

$$2\|g - f_k\| \sqrt{\|r_k\|^2 - \|f - g\|^2} \le \|g - f_k\|^2 + \|r_k\|^2 - \|f - g\|^2$$
$$= \|g - f_k\|^2 + \|r_k\|^2 - \|g - f_k - r_k\|^2$$
$$\le 2|\langle g - f_k, r_k\rangle| = 2|\langle g, r_k\rangle|.$$

This is the same as

$$\|r_k\|^2 - \|f - g\|^2 \le \frac{|\langle g, r_k\rangle|^2}{\|g - f_k\|^2}. \tag{2.32}$$
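
If we write $f_k = \Phi c^k$, with $c^k$ supported on $S_k$, then the numerator of the right side
satisfies

$$|\langle g, r_k\rangle| = |\langle \Phi z, r_k\rangle|$$
$$= |\langle z_{S_k^c}, \Phi^* r_k\rangle_{\ell^2}|$$
$$\le \|z_{S_k^c}\|_{\ell^1}\|\Phi^* r_k\|_{\ell^\infty}$$
$$\le \kappa^{-1}\|z_{S_k^c}\|_{\ell^1}|\langle r_k, \varphi_{\gamma_{k+1}}\rangle|$$
$$\le \kappa^{-1}\sqrt{\#(T \setminus S_k)}\,\|z_{S_k^c}\|_{\ell^2}|\langle r_k, \varphi_{\gamma_{k+1}}\rangle|$$
$$\le \kappa^{-1}\sqrt{\#(T \setminus S_k)}\,\|z - c^k\|_{\ell^2}|\langle r_k, \varphi_{\gamma_{k+1}}\rangle|.$$

On the other hand, recalling that $\delta = \delta_{\#(S_k \cup T)}$, the denominator satisfies by the RIP,

$$\|g - f_k\|^2 = \|\Phi(z - c^k)\|^2 \ge (1 - \delta)\|z - c^k\|_{\ell^2}^2. \tag{2.33}$$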

Therefore we have obtained

$$\|r_k\|^2 - \|f - g\|^2 \le \frac{\#(T \setminus S_k)\,|\langle r_k, \varphi_{\gamma_{k+1}}\rangle|^2}{\kappa^2(1 - \delta)}, \tag{2.34}$$

which is (2.31). □